Abstract Provenance Graphs: Anticipating and Exploiting Schema-Level Data Provenance

Authors

  • Daniel Zinn
  • Bertram Ludäscher
Abstract

{dzinn,ludaesch}@ucdavis.edu

Provenance graphs capture flow and dependency information recorded during scientific workflow runs. This information can be used afterwards to interpret, validate, and debug workflow results. In this paper, we propose a new concept called the abstract provenance graph (APG). APGs are created via static analysis of a configured workflow W and its input data schema, i.e., before the workflow is actually executed. An APG summarizes all possible provenance graphs that the workflow W can create with input data of type τ: for each input v ∈ τ there exists a graph homomorphism H_v from the concrete provenance graph to the abstract provenance graph. APGs are helpful during workflow construction because (1) they make certain workflow design bugs (e.g., selecting no input data or the wrong input data for the actors) easy to spot, and (2) they show how the overall data organization evolves across a workflow. Moreover, after workflows have been run, APGs can be used to validate concrete provenance graphs. A more detailed version of this work is available as [12].
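The homomorphism condition in the abstract can be illustrated with a small check. The following is a minimal sketch, not the authors' algorithm: it assumes both graphs are given as lists of directed edges and that the mapping H_v is given as a dictionary from concrete nodes to abstract nodes. The function name `is_homomorphism` and all node names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def is_homomorphism(
    concrete_edges: Iterable[Edge],
    abstract_edges: Iterable[Edge],
    h: Dict[Hashable, Hashable],
) -> bool:
    """Return True if h maps every concrete edge onto an abstract edge."""
    abstract = set(abstract_edges)
    for src, dst in concrete_edges:
        # Every concrete node must have an image in the abstract graph.
        if src not in h or dst not in h:
            return False
        # The image of each concrete edge must be an abstract edge.
        if (h[src], h[dst]) not in abstract:
            return False
    return True


# Hypothetical concrete provenance graph: one actor invoked twice.
concrete_edges = [
    ("reads_1", "align#1"), ("align#1", "alignment_1"),
    ("reads_2", "align#2"), ("align#2", "alignment_2"),
]

# Hypothetical abstract provenance graph over schema-level nodes.
abstract_edges = [("Reads", "Align"), ("Align", "Alignment")]

# Candidate mapping H_v from concrete nodes to abstract nodes.
h = {
    "reads_1": "Reads", "reads_2": "Reads",
    "align#1": "Align", "align#2": "Align",
    "alignment_1": "Alignment", "alignment_2": "Alignment",
}

valid = is_homomorphism(concrete_edges, abstract_edges, h)

# A recorded dependency that the abstract graph does not allow.
bad_edges = concrete_edges + [("alignment_1", "align#2")]
invalid = is_homomorphism(bad_edges, abstract_edges, h)
```

In this sketch, validating a concrete provenance graph after a run amounts to checking that such a mapping exists; a concrete edge with no abstract counterpart, as in `bad_edges`, signals a run that the schema-level analysis did not anticipate.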

Similar resources

On Explicit Provenance Management in RDF/S Graphs

The notion of RDF Named Graphs has been proposed in order to assign provenance information to data described using RDF triples. In this paper, we argue that named graphs alone cannot capture provenance information in the presence of RDFS reasoning and updates. In order to address this problem, we introduce the notion of RDF/S Graphsets: a graphset is associated with a set of RDF named graphs an...

Database Support for Exploring Scientific Workflow Provenance Graphs

Provenance graphs generated from real-world scientific workflows often contain large numbers of nodes and edges denoting various types of provenance information. A standard approach used by workflow systems is to visually present provenance information by displaying an entire (static) provenance graph. This approach makes it difficult for users to find relevant information and to explore and an...

OPQL: Querying scientific workflow provenance at the graph level

Provenance has become increasingly important in scientific workflows to understand, verify, and reproduce the result of scientific data analysis. Most existing systems store provenance data in provenance stores with proprietary provenance data models and conduct query...

A Comprehensive Model for Provenance

In this paper, we propose a provenance model able to represent the provenance of any data object captured at any abstraction layer (workflow/process/OS) and present an abstract schema of the model. The expressive nature of the model makes it potentially usable in real-world data processing systems.

What’s in a name? Exploiting URIs to enrich provenance explanations in plain English

Provenance allows decision-makers to evaluate the importance of pieces of data. PROV is the standardised model of provenance for use on the web, particularly suited for situations where data is generated by systems under distributed control, such as in coalition operations. If human decision-makers are to make effective use of provenance data, they need to understand it, and this work establish...

Publication date: 2010